
Large CompositeSubscription performance improvements #1145

Closed

Conversation

@akarnokd (Member) commented May 5, 2014

This is a proposal to improve the addition and removal speed of CompositeSubscription when it contains thousands of items.

In some operators, especially when using Schedulers and/or observeOn, thousands of items may be present in the composite, and since adding or removing an item is O(n), each add and remove takes longer and longer as the composite grows.

My proposal is to change the composite state implementation to switch to a HashSet representation above a certain threshold (and back to the array representation below another). The following diagram shows some benchmarking results. (Configuration: i7 920 @ 2.4GHz, 32K L1 Data, 256K L2, 8M L3, 6GB DDR3 1333MHz RAM, Windows 7 x64, Java 8u05 x64.)

[Figure: add/remove benchmark results]

This benchmark shows how the CompositeSubscription behaves when it already holds some items and a new unique item is added and then removed immediately (all single-threaded). The blue line indicates the current master implementation; the red, green and purple lines show the new implementation with thresholds of 8, 16 and 24 respectively. Once the internal array grows past the cache-line size, it is generally better to use a HashSet.
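To make the measured scenario concrete, here is a minimal single-threaded sketch of what one iteration does (not the actual benchmark harness; it assumes Subscriptions.empty() returns a distinct Subscription instance per call, and the pre-fill count of 1024 is arbitrary):

```java
import rx.Subscription;
import rx.subscriptions.CompositeSubscription;
import rx.subscriptions.Subscriptions;

public class AddRemoveScenario {
    public static void main(String[] args) {
        // Pre-fill the composite so the measured add/remove happens at a given size.
        int existingItems = 1024;
        CompositeSubscription composite = new CompositeSubscription();
        for (int i = 0; i < existingItems; i++) {
            composite.add(Subscriptions.empty());
        }

        // The measured operation: add one new unique subscription, then remove it.
        // With the array-based state, both calls are O(existingItems).
        Subscription s = Subscriptions.empty();
        composite.add(s);
        composite.remove(s);
    }
}
```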

The second benchmark compares how fast the CompositeSubscription can be filled with subscribers up to various capacities, and how the target size affects the fill speed.

[Figure: fill benchmark results]

When the composite is being filled, growing the array one element at a time performs better up to a fairly large capacity, but beyond that, appending further items into a HashSet is faster. Unfortunately, the threshold that seems optimal for the first case performs worse in this one.

There are some drawbacks of this hybrid approach:

  • The optimal threshold value depends on the use case and the target system, so how do we let users tune this parameter? Parameter tuning could be handy in other places in RxJava as well.
  • I'm not 100% certain if I've correctly implemented the switchover from atomic states to mutable state and back.
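Independent of the actual PR code, and ignoring the atomic state swapping the second bullet refers to, a single-threaded sketch of the threshold-based switch between an array and a HashSet could look like the following; the class name, field names and threshold value are made up for illustration:

```java
import java.util.HashSet;
import java.util.Set;

// Simplified, NOT thread-safe sketch of the hybrid idea: keep a plain array while the
// container is small and switch to a HashSet above a threshold (the real
// CompositeSubscription swaps immutable state objects atomically, which this ignores).
final class HybridContainerSketch {
    private static final int THRESHOLD = 16;   // assumed tuning parameter
    private Object[] array = new Object[0];    // small-size representation
    private Set<Object> set;                   // large-size representation, null while small

    void add(Object item) {
        if (set != null) {
            set.add(item);                     // expected O(1) once large
            return;
        }
        Object[] next = new Object[array.length + 1];
        System.arraycopy(array, 0, next, 0, array.length);
        next[array.length] = item;             // O(n) copy while small
        array = next;
        if (array.length > THRESHOLD) {        // grew past the threshold: switch to a set
            set = new HashSet<Object>();
            for (Object o : array) {
                set.add(o);
            }
            array = null;
        }
    }

    void remove(Object item) {
        if (set != null) {
            set.remove(item);
            if (set.size() <= THRESHOLD / 2) { // shrank well below the threshold: switch back
                array = set.toArray();
                set = null;
            }
            return;
        }
        // Linear scan plus compacting copy: O(n) while in the array representation.
        for (int i = 0; i < array.length; i++) {
            if (array[i].equals(item)) {
                Object[] next = new Object[array.length - 1];
                System.arraycopy(array, 0, next, 0, i);
                System.arraycopy(array, i + 1, next, i, array.length - i - 1);
                array = next;
                return;
            }
        }
    }
}
```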

@cloudbees-pull-request-builder

RxJava-pull-requests #1066 SUCCESS
This pull request looks good

@benjchristensen (Member) commented

I need more time to review this ... looks like impressive work and analysis!

Does this still make sense once #1000 is implemented and we can limit the buffering on operators like observeOn? It looks like the 1024 buffer size (a candidate for default buffer even with #1000) can still benefit from this.

@akarnokd (Member Author) commented May 5, 2014

It depends on the usage pattern. If there is a burst of thousands of new subscriptions, a bigger threshold is better. If there are lots of small additions and removals while the composite holds a moderate number of items, a lower threshold is better. I don't know of a case where thousands of subscriptions are suddenly created and added to the composite.

@akarnokd (Member Author) commented

This change has been incorporated into #1190, so I don't recommend merging this. I'll keep it open until the approach is approved or discarded.

@benjchristensen (Member) commented

I tested these changes while playing with performance. As expected, they don't play a role in the normal map/flatMap use cases exercised by this test: https://github.com/benjchristensen/RxJava/blob/performance/rxjava-core/src/perf/java/rx/usecases/PerfTransforms.java

master-performance branch

Benchmark                                       (size)   Mode   Samples         Mean   Mean error    Units
r.u.PerfTransforms.flatMapTransformsUsingFrom        1  thrpt         5  3184873.133   172320.420    ops/s
r.u.PerfTransforms.flatMapTransformsUsingFrom     1024  thrpt         5     9079.937      343.905    ops/s
r.u.PerfTransforms.flatMapTransformsUsingJust        1  thrpt         5  3411785.677    73767.161    ops/s
r.u.PerfTransforms.flatMapTransformsUsingJust     1024  thrpt         5    10860.963      294.309    ops/s
r.u.PerfTransforms.mapTransformation                 1  thrpt         5  7208334.997   703327.745    ops/s
r.u.PerfTransforms.mapTransformation              1024  thrpt         5    18720.797      278.529    ops/s

with CompositeSubscriptionSpeedup merged

Benchmark                                       (size)   Mode   Samples         Mean   Mean error    Units
r.u.PerfTransforms.flatMapTransformsUsingFrom        1  thrpt         5  3118494.120   173919.112    ops/s
r.u.PerfTransforms.flatMapTransformsUsingFrom     1024  thrpt         5     8715.267      298.710    ops/s
r.u.PerfTransforms.flatMapTransformsUsingJust        1  thrpt         5  3140414.540    17172.305    ops/s
r.u.PerfTransforms.flatMapTransformsUsingJust     1024  thrpt         5    10226.673      410.795    ops/s
r.u.PerfTransforms.mapTransformation                 1  thrpt         5  6913018.033   123193.805    ops/s
r.u.PerfTransforms.mapTransformation              1024  thrpt         5    18608.990      377.513    ops/s

reverted changes

Benchmark                                       (size)   Mode   Samples         Mean   Mean error    Units
r.u.PerfTransforms.flatMapTransformsUsingFrom        1  thrpt         5  3247111.390    63408.626    ops/s
r.u.PerfTransforms.flatMapTransformsUsingFrom     1024  thrpt         5     9103.650      223.604    ops/s
r.u.PerfTransforms.flatMapTransformsUsingJust        1  thrpt         5  3406320.200   121290.155    ops/s
r.u.PerfTransforms.flatMapTransformsUsingJust     1024  thrpt         5    10882.203      382.654    ops/s
r.u.PerfTransforms.mapTransformation                 1  thrpt         5  7353154.113   185266.016    ops/s
r.u.PerfTransforms.mapTransformation              1024  thrpt         5    17524.093      518.755    ops/s
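For context on how such numbers are produced: each row is a JMH throughput measurement parameterized by the input size. The real benchmark is the PerfTransforms class linked above; the following is only a hypothetical, self-contained sketch using current JMH annotations (the 2014 harness used an older JMH API) and a plain loop instead of Rx operators:

```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.Throughput)     // corresponds to the "thrpt" mode column
@OutputTimeUnit(TimeUnit.SECONDS)   // corresponds to the "ops/s" units column
@State(Scope.Thread)
public class TransformSketch {

    @Param({ "1", "1024" })         // corresponds to the "(size)" column
    int size;

    int[] values;

    @Setup
    public void setup() {
        values = new int[size];
        for (int i = 0; i < size; i++) {
            values[i] = i;
        }
    }

    @Benchmark
    public void mapTransformation(Blackhole bh) {
        // Stand-in for transforming `size` items per benchmark invocation.
        for (int v : values) {
            bh.consume(v + 1);
        }
    }
}
```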

@akarnokd (Member Author) commented

But mine is a bit slower than master and 0.16.x. I think the reason is that, by changing the size of the objects, allocations may cross cache-line boundaries at different points.

@akarnokd (Member Author) commented

I'm closing this because the new subscription containers have these optimizations in separate classes.
